ABSTRACT

This  paper  describes  the  trends  and  research  activities  in 
attempts  to  make  compiling  systems  easier  to  describe,  build  and 
maintain,  and  to  increase  considerably  their  flexibility  in  relation 
to the user. A conceptual "compiler-compiler" is described in detail.
Several  of  the  theoretical  and  practical  problems  related  to  its 
construction  are  discussed  and  compared  with  actual  systems  in  varying 
stages  of  implementation.   Suggestions  are  made  for  future  efforts  in 
this  area  so  that  hopefully  some  day  the  user  may  be  able  to  describe 
the  linguistic  features  he  prefers  to  use  to  describe  his  problem.   An 
extensive  bibliography  is  included. 


INTRODUCTION 

The  problem  of  communicating  more  easily  with  computers  has 
occupied  the  attention  of  hundreds  of  persons  over  the  past  several  years. 
Some  of  these  persons  seriously  suggest  the  use  of  English  or  natural 
languages  for  use  in  programming.   The  pros  and  cons  of  this  idea  have 
been  much  discussed  in  the  literature  [31]  and  will  not  be  pursued  further 
here.   Instead  we  shall  discuss  attacks  on  the  problem  from  the  standpoint 
of  making  large  strides  in  improving  the  communicability  of  present  computer 
languages.




1.   EARLY  EFFORTS  AND  GENERAL  CONCEPTS 


The first step in improving communication with machines was the
introduction of assembly languages. This was a significant step in the right
direction, but assembly language is still much too detailed, since problems
requiring over 10,000 assembly instructions are increasingly common. At about
500 man-hours per 1000 instructions, the time and expense in the programming
of such problems are considerable. The next step was procedure-oriented
languages such as FORTRAN. Although a single statement in FORTRAN may replace
several lines of assembly code, the procedure-oriented language is often a
handicap in describing an algorithm not foreseen by the language designer.

Most of us working in this area are fundamentally optimistic.
We view the computer as shown in Figure 1. We act as though, if we can
specify a problem precisely, it is usually possible to write a computer
program  to  carry  out  the  details.   Our  input  is  the  specification  of  the 
problem  and  the  output  is  the  desired  solution.   This  is  roughly  the  view 
of  a  computer  we  would  like  a  user  to  have.   Of  course,  the  user  would  be 
concerned  that  the  solution  was  appropriate  to  the  problem.   As  the  more 
sophisticated  users  would  say,  the  solution  must  represent  the  result  of  an 
appropriate  algorithm  operating  on  a  machine  of  finite  precision.   But  the 
users  would  not  be  concerned  with  the  linguistic  details  by  means  of  which 
the  computer  was  able  to  interpret  the  problem  in  the  first  place.   These 
details  are  the  concern  of  this  paper. 

In  providing  the  user  with  assembly  languages  such  as  SAP  and 
AUTOCODER,  or  with  procedure-oriented  languages  such  as  FORTRAN,  ALGOL, 
and  COBOL,  we  also  provide  an  operating  computer  program  known  as  an 
assembler  or  compiler  respectively,  or  more  generally,  as  a  translator  or 
processor.   We  shall  use  the  term  processor  throughout  this  paper  to  denote 
the  operating  system.   Usually  the  processor  operates  on  the  same  machine 
as  the  specified  procedure.   However,  this  is  not  necessary.   For  generality, 
therefore,  we  will  refer  to  the  machine  effecting  the  translation  as 
computer  T  and  the  machine  effecting  the  execution  as  computer  E. 

Figure 1. User's View of a Computer

It is desired to make available to the user a more powerful (or
at  least  more  useful)  language  in  which  he  can  discuss  his  problem  with  the 
computer.   One  of  the  greatest  stumbling  blocks  to  providing  such  a  facility 
is  the  writing  of  a  suitable  set  of  processors.   Early  efforts  to  automate 
the writing of processors were made by Irons [23] and by Brooker and Morris
[3, 4, 5]. The processors which automatically produce processors are
frequently referred to as "compiler-compilers." The input to such a master
processor may consist of a description of the translating machine, a
description of the language to be translated, and a description of the machine
on  which  the  algorithm  is  to  be  executed.   The  output  consists  of  a  processor 
appropriate  for  use  on  the  computer  effecting  the  translation. 

The  methods  by  which  this  general  scheme  has  been  attempted  are 
quite varied. In some systems an interpretative routine accepts the
specification of a language and uses the specification to interpret and execute
the  input  language  procedure;  in  other  systems  several  "passes"  of  the 
input  string  or  its  equivalent  may  be  required.   Generally  there  is  little 
agreement  on  the  meaning  of  the  term  "compiler-compiler,"  and  consequently 
this  term  has  fallen  into  a  state  of  disuse.   The  common  idea  is,  however, 
that  instead  of  having  to  write  a  translator  for  each  particular  language 
and  each  particular  machine,  we  write  specifications  of  these  languages  and 
these  machines.   For  several  years  it  has  been  argued  that  in  this  way  one 
can  describe  m  languages  and  n  machines  in  m  +  n  sets  instead  of  writing 
m  times  n  translators.   Unfortunately,  the  situation  has  proved  not  quite 
so simple. There are important problems neatly hidden in such a simple
description, but before discussing a few of these, let us examine a simple
theory and some of the current efforts.

A  processor  can  be  defined  as  a  function  for  the  transformation 
of  an  input  string  given  in  one  language  to  an  output  string  in  another  [6]. 
The  parameters  of  this  function  are: 

A.  Source  language  of  the  input  string; 

B.  Target  language  of  the  output  string; 

C.  Own  language  of  the  processor  itself; 

D. Other parameters describing the algorithms used in the
translation, the efficiency of the processor, optimization
of the output string, etc.

The most important function of a processor is that of changing the
language  representation  of  a  problem,  particularly  from  its  source  language 
into  a  machine  language.   Therefore,  if  no  regard  is  given  to  the  other 
parameters, we may consider a processor as an element of a three-dimensional
language space. Such an element may be represented as a triple:


(Source language of input, Target language of output, Own language of processor)


Furthermore  if  we  make  the  generalization  of  regarding  the  execution 
of  an  algorithm  as  a  processor  carrying  the  language  of  the  problem  or  data 
into  the  language  of  the  solution,  we  see  that  the  required  machine  input  for 
the  algorithm  is 

(Problem,  Solution,  Machine  E). 

This  input  is  generated  by  the  processor 

(Procedure  Language  L,  Machine  E,  Machine  T) 

which  operates  on  the  source  program 

(Problem,  Solution,  Procedure  Language  L). 

For  brevity,  we  shall  indicate  this  operation  as  follows: 

input processor * operating processor -> output processor,
namely,

(P, S, L) * (L, ME, MT) -> (P, S, ME).

This  operation  is  illustrated  in  Figure  2. 

COMPUTER T: (P, S, L) * (L, ME, MT) -> (P, S, ME)

Figure 2. The Compiling of a Program
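
The triple notation can be modelled in a few lines of code. The following is a minimal sketch only; the names `Processor` and `apply_processor` are assumptions introduced here for illustration and do not appear in the systems discussed. It checks that the operating processor can read the language in which the input is written, and then forms the output triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """A processor as a triple (source, target, own language)."""
    source: str
    target: str
    own: str


def apply_processor(inp: Processor, op: Processor) -> Processor:
    """Model 'inp * op -> out' (hypothetical helper).

    The text of inp is written in inp.own, so op must accept that
    language as its source. The result does the same job as inp,
    but is now expressed in op.target.
    """
    if inp.own != op.source:
        raise ValueError(
            f"operating processor reads {op.source!r}, "
            f"but the input is written in {inp.own!r}"
        )
    return Processor(inp.source, inp.target, op.target)


# (P, S, L) * (L, ME, MT) -> (P, S, ME)
program = Processor("P", "S", "L")
compiler = Processor("L", "ME", "MT")
compiled = apply_processor(program, compiler)
print(compiled)
```

Note that the own language of the operating processor (here MT) does not appear in the result; it only says on which machine the translation is carried out.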

The description of the compiler-compiler process is somewhat more
complex. Let us suppose we have a suitable metalanguage U and that our master
processor operates on Machine CC. The input described previously then consists
of the processors

(U,  MT,  U), 

(L,   U,  U), 
and 

(U,  ME,  U). 

The  compiler-compiler  itself  consists  of  a  processor  and  a  monitor.   The 
processor  is 

(U,  MCC,  MCC). 

The  monitor  has  the  ability  to  organize  the  operational  steps  as  shown  in 
Figure  3  and  described  below: 

A.  First  the  compiler-compiler  translates  the  description  of 
Machine  T  into  the  language  of  Machine  CC: 

(U, MT, U) * (U, MCC, MCC) -> (U, MT, MCC)

The  processor  resulting  from  this  step  is  then  used  as 
the  operating  program  in  the  next  two  steps. 

B.  This  processor  translates  the  specifications  of  the  procedure 
language  from  the  metalanguage  U  to  the  language  of  Machine  T: 

(L, U, U) * (U, MT, MCC) -> (L, U, MT)

C.  Likewise  it  translates  the  description  of  Machine  E  from 
the  metalanguage  U  to  the  language  of  Machine  T: 

(U, ME, U) * (U, MT, MCC) -> (U, ME, MT)

COMPUTER CC

MONITOR

(U, MT, U) * (U, MCC, MCC) -> (U, MT, MCC)
(L, U, U) * (U, MT, MCC) -> (L, U, MT)
(U, ME, U) * (U, MT, MCC) -> (U, ME, MT)
(L, U, MT) • (U, ME, MT) -> (L, ME, MT)

Figure 3. Operations of a Compiler-Compiler

The process is now essentially complete although it may be
objected  that  we  have  not  produced  the  required  processor  (L,  ME,  MT). 
However,  the  processors  produced  in  steps  B  and  C,  when  combined,  are 
equivalent  to  (L,  ME,  MT),  for  they  each  operate  on  Machine  T;  the  first 
translates  the  procedure  language  L  to  the  metalanguage  U  and  the  second 
translates  the  metalanguage  U  to  the  language  of  Machine  E.   The  solution, 
then, is to run these two processors sequentially on Machine T. It is
convenient to represent the equivalence of these processes as follows:

(L, U, MT) • (U, ME, MT) -> (L, ME, MT)
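
Steps A through C and the final combination can be traced mechanically. The sketch below is an illustration under stated assumptions: `Proc`, `translate`, and `run_in_sequence` are hypothetical names chosen here, and the model captures only the bookkeeping of the triples, not any actual translation.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    source: str
    target: str
    own: str


def translate(inp: Proc, op: Proc) -> Proc:
    """inp * op: op rewrites inp from the language inp.own into op.target."""
    if inp.own != op.source:
        raise ValueError("operating processor cannot read the input")
    return Proc(inp.source, inp.target, op.target)


def run_in_sequence(first: Proc, second: Proc) -> Proc:
    """first . second: run both on one machine, feeding first's output to second."""
    if first.own != second.own:
        raise ValueError("processors run on different machines")
    if first.target != second.source:
        raise ValueError("output of first is not the input language of second")
    return Proc(first.source, second.target, first.own)


master = Proc("U", "MCC", "MCC")                   # the compiler-compiler itself
step_a = translate(Proc("U", "MT", "U"), master)   # (U, MT, MCC)
step_b = translate(Proc("L", "U", "U"), step_a)    # (L, U, MT)
step_c = translate(Proc("U", "ME", "U"), step_a)   # (U, ME, MT)
result = run_in_sequence(step_b, step_c)           # (L, ME, MT)
print(result)
```

The two checks in `run_in_sequence` state the conditions under which the combination is meaningful: both processors must operate on Machine T, and the metalanguage U must be the common intermediate form.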

There  are  some  very  subtle  questions  that  are  neatly  avoided  in 
this simple presentation. For instance, is it possible to specify a
metalanguage U adequate for the task? If so, is it possible to describe the
algorithms  necessary  to  effect  the  translations?   If  so,  will  the  language 
specifications  and  machine  descriptions  be  sufficiently  simple  to  be  useful? 
The  answers  are  not  obvious,  but  the  author  believes  that  current  research 
efforts  will  soon  yield  conclusions  that  are  essentially  affirmative. 

An  interesting  and  important  generality  of  this  description  is 
seen  if  the  point  of  view  is  shifted  slightly.   Suppose  we  regard  the  third 
element  of  our  input  to  the  compiler-compiler  as  a  description  of  the 
language  of  Machine  E.   Then  we  could  replace  it  by  the  description  of  some 
other  language.   In  this  general  sense,  we  see  that  our  concept  includes  the 
facility  for  the  construction  of  natural  language  translators. 

Although  the  problems  of  natural  language  translation  are  qualitatively 
similar to the problems of mechanical language translation, they are
considerably more complex. We shall exclude these specific problems from our
consideration,  but  we  shall  not  exclude  the  related  techniques. 

Other examples of translating from one language to another would
include  translations  of  one  programming  language  to  another,  algebraic 
simplification,  the  algebraic  solution  of  problems,  algebraic  differentiation, 
integration,  and  so  forth.   Hereafter  we  shall  refer  to  these  languages  as 
the  input  language  and  output  language  respectively. 

2. THEORETICAL PROBLEMS


There  are  other  problems  between  the  definition  of  a  compiler  to 
write  all  other  compilers  and  its  practical  realization.  Not  the  least  of 
these  is  a  basic  understanding  of  language  grammars.   A  few  proposals  have 
appeared in the literature, mostly following the lead of Chomsky's work [11]
on  natural  language  grammars.   Although  this  work  is  progressing,  a  suitable 
mathematical  model  is  still  somewhat  elusive.   It  is  not  clear,  on  the  basis 
of these proposals, how to design a metalanguage U adequate for the
description of the input and output languages. For a limited class of languages,
Backus-Naur Form has been shown quite suitable for a description of the
syntax, but even for this limited class a general method for the
description of the semantics remains elusive.

The  problem  of  describing  semantics  is  usually  circumvented  in 
one  way  or  another,  and  no  general  theory  of  semantic  descriptions  has  been 
published.   The  usual  method  of  describing  the  semantics  is  to  specify 
directly  that  a  particular  syntactic  construct  (perhaps  within  a  specified 
context)  in  the  input  language  has  the  same  meaning  as  another  syntactic 
construct  in  the  output  language.   Thus  the  interpretation  from  the  input 
language  to  the  output  language  is  specified  directly.   Herein  arises  the 
gray  area  that  makes  it  difficult  to  distinguish  meaningfully  between  syntax 
and  semantics. 

Besides the problem of adequacy, there remains the problem of ambiguity.
It  is  not  known,  for  example,  how  to  synthesize  even  a  phrase  structure 
grammar  for  a  programming  language,  given  the  precedence  relations  of  its 
operators [16]. It is known that the general question of whether a given
phrase structure grammar is ambiguous is undecidable [7,17,22]. Gorn [21]
published a method for the detection of such ambiguities, but he admits that
the method is of no practical importance. Perhaps the greatest
encouragement on the ambiguity problem arises from a paper by Wirth and Weber [34].
This paper gives sufficient conditions for a precedence phrase
structure grammar, and proves that such a grammar is unambiguous.

The class of precedence phrase structure languages includes regular
languages  but  obviously  not  the  full  set  of  phrase  structure  languages,  for 
it  does  not  include  any  ambiguous  phrase  structure  languages.   It  is  not 
apparent that all unambiguous phrase structure languages can be written as
precedence languages, nor has any counterexample been published.

3. PRACTICAL PROBLEMS


Along  more  practical  lines,  the  compiler-compiler  is  itself  a 
sophisticated  compiler  of  the  metalanguage  U,  and  the  programming  of  such 
a  tool  can  be  expected  to  involve  much  time,  effort  and  planning.   If  it 
were  not  for  this  factor,  the  experimental  use  of  such  programming  systems 
would  provide  a  greater  quantity  of  data  to  assist  in  the  theoretical 
development.   As  experience  is  gained  in  these  systems,  both  better  systems 
and  more  adequate  theories  can  be  expected. 

Is  the  effort  of  constructing  such  a  tool  worthwhile?  The  practical 
reason for wanting a compiler-compiler is to provide a tool for the
development of new programming languages. An important by-product will be greater
understanding  of  the  structure  and  role  of  language. 

Without  the  use  of  a  compiler-compiler,  or  similar  tool,  every 
innovation  in  programming  languages  must  involve  the  programming  effort  of 
a  new  compiler  or  at  least  the  specialized  work  of  modifying  an  existing 
compiler.   New  programming  languages  are  necessary  because  the  user  finds 
that  present  languages  are  not  really  suited  to  solve  his  problems,  or  else 
they  have  such  complex  syntactic  and  semantic  structures  that  they  are 
difficult  to  learn  and  use:   in  brief,  because  they  do  not  speak  the  language 
of  his  particular  speciality.   But  this  is  only  part  of  the  story.   Time 
sharing,  multiprogramming,  multiprocessing,  and  visual  displays  are  adding 
new  dimensions  to  the  processing  capability  of  hardware.   New  applications, 
such  as  processing  radar  and  weather  information,  are  giving  rise  to  new 
techniques in picture processing, parallel processing, and pattern recognition.
New  languages  are  needed  for  the  expression  of  complex  systems  in  these  new 
environments,  and  to  represent  algorithms  that  can  be  executed  efficiently 
by machines with these capabilities. Furthermore, in these new application
areas there are very few experienced programmers who might suggest the
language constructs which would prove most useful, as was possible in the
early days of FORTRAN. Many of these languages will have to develop by trial
and error: an impossibly time-consuming task without at least a primitive
compiler-compiler. Last, but not least, we must develop the metalanguage U
which is the input to the compiler-compiler itself.

Now  these  problems  could  be  avoided  if  we  could  instead  design  a 
universal  programming  language.   The  method  for  doing  this,  described  by 
Burkhardt [6], has proved particularly unfruitful. Furthermore, an existing
compiler-compiler  would  be  a  significant  assistance  in  the  evolutionary 
approach  he  describes. 

4. SYSTEMS AND METHODS

Burkhardt  [6]  has  described  several  systems  among  which  were 
UNCOL  [2,35],  CLIP-JOVIAL  [1,12,24,25,33],  and  TGS  [9,10].   The  UNCOL  and 
CLIP-JOVIAL systems present their greatest problem in the design of a suitable
intermediate language or metalanguage. The TGS system is a continuing
project  of  Computer  Associates,  Inc. 

Several  more  recent  systems  have  been  devised  and  are  in  varying 
stages  of  implementation.   Each  of  these  systems  attempts  to  reach  a  solution 
by some technique of separating the "syntax" from the "semantics." A brief
description  of  five  such  systems  follows. 

A. Formal Semantics Language -- "FSL"

The  FSL  system  developed  by  Feldman  [1,3,14]  is  based  upon 
recognition of Floyd productions [18]. The compiler-compiler consists of
essentially two pieces, one to accept Floyd productions and the other to
accept the Formal Semantics. The system can be categorized as a generator
for a single-pass, bounded-context, syntax-directed compiler [15]. The
metalanguage allows for the description of actions to be taken at compile time
or  run  time,  and  the  distinction  between  the  time  of  these  actions  must  be 
carefully  considered  in  writing  the  metalanguage.   The  bounded  context 
analysis  is  preceded  by  a  lexical  analysis  [8]  known  as  the  subscan  system 
that  treats  identifiers,  operators,  delimiters,  and  punctuation  marks  each 
as units of the input string. The input string is then placed into a pushdown
stack for the reductive analysis [34]. The bound of the analysis is the
five  units  at  the  top  of  this  stack.   Each  set  of  one  to  five  such  units 
must  be  concatenated  into  another  set,  not  larger  than  the  first  set  nor 
larger  than  three  units,  but  the  resulting  set  may  contain  metalinguistic 
elements  not  part  of  the  input  string,  such  as  entities  called  "arithmetic 
expression,"  "declaration,"  or  "program  label."  As  each  concatenation 
takes  place,  object  code  may  be  generated,  other  compiler  and  data  areas 
may  be  altered,  and/or  additional  units  of  the  input  string  may  be  placed 
on  the  stack,  as  the  analysis  demands.   The  order  of  the  analysis  is  strictly 
controlled  by  the  metalinguistic  description. 
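
The flavor of such a bounded reductive analysis can be conveyed by a small sketch. The rule format below is invented for illustration and is not the notation of FSL or of Floyd productions; the names `reductive_analysis` and `rules`, and the `LOAD`/`ADD` object code, are assumptions. Each rule matches the kinds of the topmost units of the stack (at most five), replaces them by a set no larger than the matched set nor larger than three units, and may emit object code. When no rule applies, another unit of the input string is placed on the stack.

```python
def reductive_analysis(units, rules, bound=5):
    """Shift-reduce loop that inspects at most `bound` units on top of the stack.

    units: list of (kind, text) pairs produced by a lexical scan.
    rules: ordered list of (pattern, action); pattern is a non-empty tuple of
           kinds, action(matched) returns (replacement_units, emitted_code).
    """
    stack, code = [], []
    pending = list(units)
    while True:
        for pattern, action in rules:
            n = len(pattern)
            if not 1 <= n <= bound or n > len(stack):
                continue
            top = stack[-n:]
            if tuple(kind for kind, _ in top) == pattern:
                replacement, emitted = action(top)
                if len(replacement) > min(n, 3):
                    raise ValueError("replacement set is too large")
                stack[-n:] = replacement      # the reduction itself
                code.extend(emitted)          # object code, if any
                break
        else:
            # No rule matched: bring in more input, or stop when it is exhausted.
            if not pending:
                return stack, code
            stack.append(pending.pop(0))


rules = [
    (("id",), lambda top: ([("expr", top[0][1])], [f"LOAD {top[0][1]}"])),
    (("expr", "+", "expr"), lambda top: ([("expr", "sum")], ["ADD"])),
]

# A trivial stand-in for the lexical scan: names become "id" units.
units = [("id" if t.isalnum() else t, t) for t in "a + b + c".split()]
stack, code = reductive_analysis(units, rules)
print(code)
```

The order of the rules controls the order of the analysis, which corresponds to the remark that the analysis is strictly controlled by the metalinguistic description.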

A complete FSL program for a Formula ALGOL [28] compiler is in
operation.   Thus,  although  this  system  has  some  important  limitations,  it 
is  functioning  usefully. 

B.  COGENT 

The COGENT system developed by Reynolds [29,30] is based upon an
analysis of Backus-Naur Form [27] productions. Coupled with each production
is a semantic routine to be executed whenever that production is recognized
as part of the input string. The semantic routine is written in an
ALGOL-like language. These two styles, BNF and ALGOL, together form the structure
of  the  COGENT  metalanguage. 

The  characteristics  of  the  input  language  L  and  the  output 
language  X  are  each  described  in  COGENT.   The  action  of  the  operating 
processor  is  to  construct  a  translator  to  translate  from  L  to  X.   Within 
the class of languages that can be described by COGENT, this program is a
very powerful tool. Reynolds has used the system to carry out algebraic
manipulations [29] and it has been given serious consideration for use in
translating CDC 3600 FORTRAN programs into a form suitable for use on IBM
System/360 [32]. The chief drawback of this system seems to be its total
commitment to a large, fast store on the CDC 3600.

C.  The  TMG  System 

This system, developed by McClure [26], is basically a system for
describing  syntax  directed  compilers,  with  some  interesting  differences  to 
make  it  easier  to  handle  errors  and  declarative  information.   The  system 
is  designed  to  make  it  easy  to  construct  a  simple  one-pass  translator  for 
some  specialized  language.   It  has  been  used  to  construct  compilers  for 
FORTRAN  and  PL/I,  as  well  as  for  a  simple  data  format  conversion.   The 
most  serious  weakness  of  the  system  is  inefficient  object  code  in  the 
output  (machine)  language. 

D.  GARGOYLE 

Like the TGS system, the GARGOYLE system developed by Garwick [19,20]
is  an  attempt  to  generalize  the  methods  of  compiler  construction  so  as  to 
describe  a  very  efficient  operating  system  capable  of  producing  efficient 
output. The system essentially consists of basic pieces of compiler logic
and linkage mechanisms, which provide a convenient environment for the
specification of the compilation process. The compiler description is
presented to the system as though it were a three-address machine. The
syntax of the language L is expressed using the built-in logic and the
semantics is expressed in a small subset of ALGOL. The system allows for
local and global variables. One of its unique and interesting features
is the storage assignment algorithm for the construction of several
independent stacks. According to this algorithm, the stacks are allocated
storage  space  evenly  over  the  remaining  portion  of  the  core  memory.   If  a 
stack  attempts  to  exceed  its  allocated  space,  the  stacks  are  dynamically 
relocated  so  that  the  available  storage  space  is  split  among  the  various 
stacks nearly in proportion to their respective growth since the previous
storage allocation. Special provision is made to prevent zero allocation
to arrays which perchance have had zero growth. Another innovation
introduced by Garwick is the status variable. This is essentially a global
variable stack. Status variables are declared external to a routine, but
are stored in a stack as are local variables. Their current values are
copied back and forth as the stack is retreated or advanced, but when
backtracking occurs, status variables regain the values they had at the point
when the backtracking was prepared. Pointers to the input string are a
particular application.
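
The reallocation rule for the stacks can be sketched as follows. This is not Garwick's code: the function name `reallocate`, the `minimum` floor used here as the "special provision" against zero allocation, and the treatment of rounding are all assumptions made for illustration.

```python
def reallocate(free_words, growth, minimum=1):
    """Split free_words among stacks roughly in proportion to recent growth.

    growth[i] is the growth of stack i since the previous allocation.
    Every stack receives at least `minimum` words (an assumed safeguard),
    so a stack with zero growth is never left with zero space.
    """
    n = len(growth)
    if free_words < n * minimum:
        raise MemoryError("not enough storage for every stack")

    spare = free_words - n * minimum
    # With no growth history at all, fall back to an even split.
    weights = growth if sum(growth) > 0 else [1] * n
    total = sum(weights)

    extra = [spare * w // total for w in weights]
    leftover = spare - sum(extra)          # words lost to integer rounding
    # Hand the rounding remainder to the fastest-growing stacks.
    fastest = sorted(range(n), key=lambda i: weights[i], reverse=True)
    for i in fastest[:leftover]:
        extra[i] += 1

    return [minimum + e for e in extra]


shares = reallocate(100, [30, 10, 0])
print(shares)
```

The initial even allocation described in the text corresponds to calling `reallocate` with all growth figures equal to zero.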

The  GARGOYLE  compiler  thus  allows  the  compiler  writer  to  avoid 
many of the "housekeeping" routines. Also, because the writer must construct
the  compiler  out  of  blocks  which  are  largely  independent,  he  begins  to  think 
about  the  program  in  a  modular  way.   Perhaps  of  even  greater  interest  in 
some  applications  is  the  high  efficiency  of  a  compiler  written  in  this 
manner. 

The GARGOYLE system has been implemented on a CDC 3600. It has
been  used  successfully  to  construct  itself  and  to  construct  an  ALGOL  60 
compiler. 

E. EULER

The  EULER  papers  by  Wirth  and  Weber  [36]  describe  both  a  language 
that  is  an  extension  to  ALGOL,  and  a  programming  system  for  the  formal 
definition  of  a  programming  language  and  its  compiler.   It  is  the  latter  to 
which  we  refer  here.   The  system  allows  for  the  description  of  an  essentially* 
phrase  structure  precedence  language  using  Backus-Naur  Form  productions. 
The  semantic  content  corresponding  to  each  production  can  be  expressed  in  any 
available  programming  language.   The  algorithm  for  processing  the  productions 
constructs  a  precedence  relation  between  each  possible  pair  of  elements  of 
the  input  string  of  language  L.   These  relations  unambiguously  establish  the 
production whose right side is to be matched with the input. Thus a
moderately efficient compiler can be produced directly from the formal definition
of  the  language.   The  reduced  efficiency  of  this  compiler  seems  to  result 
mainly  from  an  arbitrary  search  over  the  productions  at  each  identification. 
If  this  inefficiency  were  annoying,  a  stack  or  sorting  technique  could 
probably  be  implemented  to  significantly  reduce  this  search. 
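
The construction of precedence relations from a set of productions can be sketched as below. This is a minimal illustration of the usual simple-precedence construction, not the EULER implementation; the names `closure_sets` and `precedence_relations`, the dictionary representation of the grammar, and the toy grammar itself are assumptions. A pair of symbols related in more than one way signals that the grammar is not a precedence grammar.

```python
def closure_sets(grammar, pick):
    """For each nonterminal, all symbols reachable by repeatedly taking pick(rhs)."""
    sets = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                s = pick(rhs)
                new = {s} | sets.get(s, set())
                if not new <= sets[a]:
                    sets[a] |= new
                    changed = True
    return sets


def precedence_relations(grammar):
    """Return {(x, y): set of relations}, using '=', '<' and '>'."""
    heads = closure_sets(grammar, lambda rhs: rhs[0])    # leftmost symbols
    tails = closure_sets(grammar, lambda rhs: rhs[-1])   # rightmost symbols
    table = {}

    def record(x, y, rel):
        table.setdefault((x, y), set()).add(rel)

    for alternatives in grammar.values():
        for rhs in alternatives:
            for x, y in zip(rhs, rhs[1:]):        # x and y adjacent in a right side
                record(x, y, "=")
                for h in heads.get(y, ()):        # y can begin with h
                    record(x, h, "<")
                for t in tails.get(x, ()):        # x can end with t
                    record(t, y, ">")
                    for h in heads.get(y, ()):
                        record(t, h, ">")
    return table


# Toy grammar (an assumption for illustration): E -> E + T | T ; T -> a
grammar = {"E": [["E", "+", "T"], ["T"]], "T": [["a"]]}
table = precedence_relations(grammar)
conflicts = [pair for pair, rels in table.items() if len(rels) > 1]
print(table, conflicts)
```

For this toy grammar every related pair carries exactly one relation, so the relations determine unambiguously where each reducible right side begins and ends.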

In  reviewing  the  systems  discussed  above  and  by  Burkhardt  [6]  one 
wonders  if  this  area,  too,  is  likely  to  result  in  a  proliferation  of 
languages.   Certainly  these  projects  do  not  seem  to  be  all  heading  in  the 
same  direction.   One  is  concerned  with  getting  up  a  compiler  quickly  without 
regard  to  its  efficiency  or  the  optimization  of  the  output,  another  is 
concerned  with  both  its  efficiency  and  the  optimization  of  the  output,  another 
is  primarily  concerned  with  a  formal  description  of  the  language.   Some  are 
more  restrictive  in  their  areas  of  application  than  others. 

At  the  present  time,  perhaps  all  these  systems  are  needed  to  meet 
our  varying  needs.   For  languages  with  wide  use,  efficiency  is  still  of  some 
importance,  although  the  diminishing  cost  of  hardware  is  reducing  the  effects 
of  inefficiency.   For  experimental  languages,  the  ability  to  get  a  system 
operating  with  ease  is  most  important,  and  efficiency  is  of  little  if  any 
concern. 


* The context dependency of translation arising from declaratives is
handled  by  a  careful  choice  of  right  and  left  recursive  productions  and 
appropriate  use  of  stacks.   By  such  techniques,  any  finite  number  of 
context  sensitive  classes  can  clearly  be  translated. 

5. BOOTSTRAPPING


The subject of universal translators would hardly be complete
without at least mentioning the bootstrap concept. If a metalanguage can
describe any translator, then it can describe itself or another
metalinguistic processor. Thus if a simple compiler-compiler system is
implemented, it has the ability to assist in its own improvement. This is
illustrated in Figure 4, where we will suppose that L2 is an improved version
of L1, and that MA is the machine assembly language output. We first use
the power of language L1 to describe language L2, then compile this processor
to produce a processor for L2. The second line of the figure assembles the
L2 translator. Furthermore, if L2 is more powerful in the sense that it can
describe a more efficient translation than L1, this process may be repeated
using L2 to describe itself more efficiently than it could be described by
L1. This whole process can be repeated almost indefinitely simply by making
small patches to the metalanguage description of itself.

COMPUTER M: (L2, MA, L1) * (L1, MA, M) -> (L2, MA, MA)
COMPUTER M: (L2, MA, MA) * (MA, M, M) -> (L2, MA, M)

Figure 4. A Bootstrap Process
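
The two lines of the bootstrap figure, and one further repetition of the process, can be traced with the same triple bookkeeping. The sketch is illustrative only; the helper `translate` and the variable names are assumptions, and the triples are plain tuples (source, target, own language).

```python
def translate(inp, op):
    """inp * op: rewrite the processor inp from its own language into op's target."""
    source, target, own = inp
    op_source, op_target, _op_own = op
    if own != op_source:
        raise ValueError(f"cannot translate text written in {own!r}")
    return (source, target, op_target)


l1_compiler = ("L1", "MA", "M")    # existing processor for L1 on machine M
assembler = ("MA", "M", "M")       # assembler for machine M

# First line of the figure: L2 described in L1, compiled to assembly language.
l2_in_l1 = ("L2", "MA", "L1")
l2_in_ma = translate(l2_in_l1, l1_compiler)    # (L2, MA, MA)

# Second line: the assembly text is assembled into a running L2 processor.
l2_on_m = translate(l2_in_ma, assembler)       # (L2, MA, M)

# Repetition: L2 now described in L2 itself, processed by the new L2 processor.
l2_in_l2 = ("L2", "MA", "L2")
l2_again = translate(translate(l2_in_l2, l2_on_m), assembler)
print(l2_on_m, l2_again)
```

The triples before and after the repetition are identical; any gain in efficiency lies in the quality of the processor text, which this bookkeeping deliberately does not model.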

6. STATE OF THE ART


Generally  each  of  the  systems  described  above  has  simplified  the 
art  of  compiler  writing  by  a  condensation  of  the  relevant  parts  of  the 
system.   The  beginning  of  a  theory  of  what  we  have  been  doing  is  slowly 
emerging.   Condensation  and  removal  of  redundant  procedures  in  compiler 
writing  has  significantly  reduced  both  the  man-months  required  for  this 
activity  and  the  errors  introduced  into  its  programming.   Thus,  although 
the systems programmer's job may be easier, he has hardly relinquished it
or  made  it  comprehensible  to  others. 

Interestingly, none of the systems described here has addressed
the problems of incremental compiling for a time-sharing system, although
the generality of several of these systems will certainly accommodate this
new demand.

The trend of a few years ago to build quick-and-dirty one-pass
compilers has clearly receded, although the usefulness of one-pass compilers
has been firmly established. Systems designed to accommodate the description
of multiple-pass compilers are few, however, and the techniques of these
designs are yet indistinct. A recent paper [34] describes a method for
specifying the allocation of storage in multiple-pass compilers.

7. THE METALANGUAGE


Central  to  the  problem  of  compiler-compilers  is  the  selection 
of  an  appropriate  metalanguage.   To  determine  the  desired  features  of  a 
metalanguage,  let  us  first  consider  the  reasons  for  the  existence  of 
present  difficulties. 

In  designing  a  computer  language  L,  and  in  writing  a  translator, 
a  great  deal  of  flexibility  arises.   This  flexibility  is,  in  a  sense,  both 
the  power  and  the  downfall  of  persons  engaged  in  these  activities.   It  is 
the  power  because  it  is  just  this  flexibility  that  makes  possible  syntactic 
constructions  with  semantic  content  too  complicated  and  too  detailed  to  be 
expressed  concisely  in  a  natural  language,  and  often  difficult  to  express 
in  classical  mathematical  languages.   It  is  the  downfall  because  it  is 
this  flexibility  that  requires  the  specification  of  a  large  amount  of  detail. 
Basically,  then,  the  problem  of  the  design  of  a  metalanguage  U  is  to  retain 
nearly  all  the  flexibility  while  rejecting,  or  at  least  reducing,  the 
necessity  for  the  detail. 

Several  approaches  are  immediately  apparent.   Among  these  are: 

A.  A machine-independent specification of at least a few of
the usual computer operations, by the use of macros, for
example;

B.  Condensation of at least part of the detail into a more
abstract form, such as Backus-Naur Form;

C.  Treatment  of  at  least  part  of  the  detail  as  a  function, 
subprogram,  block  structure,  etc.,  as  in  ALGOL; 

D.  Treatment  of  the  detail  as  an  addend  or  patch  to  some 
basic  structure,  as  in  editing  routines. 

It seems difficult, and it may be impossible, to avoid the detailed
specifications in the design of a language and in the writing of a translator.
If this is true, the solution to the problem must lie in approaches (C) and
(D).
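
Approaches (C) and (D) can be pictured with a small sketch in Python.
It is illustrative only and is not drawn from any of the systems
discussed. The pseudo-assembly operation codes (LOAD, ADD, MUL, SUB,
CALL MULT) and all function names are assumptions made for the example.
The detail for each construct is held in its own function, as in (C),
and a patch adds or replaces functions without touching the basic
structure, as in (D).

```python
from typing import Callable, Dict, List

Emitter = Callable[[str, str], List[str]]


# Approach (C): the detail for each construct lives in its own function.
def emit_add(a: str, b: str) -> List[str]:
    return [f"LOAD {a}", f"ADD {b}"]


def emit_mul(a: str, b: str) -> List[str]:
    return [f"LOAD {a}", f"MUL {b}"]


# The basic structure: a table from operator to the function holding its detail.
BASE_TRANSLATOR: Dict[str, Emitter] = {"+": emit_add, "*": emit_mul}


def patched(base: Dict[str, Emitter], patch: Dict[str, Emitter]) -> Dict[str, Emitter]:
    """Approach (D): return a copy of `base` with the entries of `patch`
    added or replaced. The base table itself is left unchanged."""
    result = dict(base)
    result.update(patch)
    return result


def translate(table: Dict[str, Emitter], a: str, op: str, b: str) -> List[str]:
    """Translate the expression `a op b` using the functions in `table`."""
    if op not in table:
        raise KeyError(f"no translation defined for operator {op!r}")
    return table[op](a, b)


# A patch that adds subtraction and replaces multiplication by a call
# to a (hypothetical) multiplication subroutine.
def emit_sub(a: str, b: str) -> List[str]:
    return [f"LOAD {a}", f"SUB {b}"]


def emit_mul_by_call(a: str, b: str) -> List[str]:
    return [f"LOAD {a}", f"CALL MULT {b}"]


PATCHED_TRANSLATOR = patched(BASE_TRANSLATOR, {"-": emit_sub, "*": emit_mul_by_call})

print(translate(BASE_TRANSLATOR, "x", "*", "y"))
print(translate(PATCHED_TRANSLATOR, "x", "*", "y"))
print(translate(PATCHED_TRANSLATOR, "x", "-", "y"))
```

The detail has not disappeared in this sketch. It has only been confined
to small units that can be written, replaced, or added one at a time.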

In designing a metalanguage and a universal translator system,
therefore, the following features should be considered for incorporation
into the system.

A.  It should be possible to describe the execution of an
unbounded number of inspections of the input string. In
other words, the system should be capable of describing as
many "passes" over the input string as necessary for the
optimum performance of the task.

B.  It should be possible to extend the context analysis as
far as necessary, that is, to perform an unbounded context
analysis. It should be possible to look as far ahead or
behind as necessary for key words affecting the translation.
(These two features would significantly extend the class of
languages that can be translated.)

C.  There should be provisions for defining the interpretation
of those constructions of the metalanguage which are
generally machine independent. Perhaps these definitions
should be given in the assembly language of a particular
machine.

D.  The resulting translator should have an external block
structure with a minimum of interaction between blocks. It
should be possible to construct nearly independent blocks
of the translator for each logical feature of the input
language.

E.  It  should  be  possible  to  draw  blocks  from  a  library,  as 
desired,  and  if  desired,  to  patch  these  blocks  as  they  are 
drawn.   It  should  also,  of  course,  be  possible  to  add, 
delete,  or  alter  blocks  in  the  library. 

F.  It  should  be  possible  to  write  each  block  so  that  at  least 
a  simple  class  of  language  features  can  be  added  without 
an  actual  patch  to  the  block  itself. 
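
Features (A) and (B) can be illustrated by a minimal two-pass sketch
in Python. It is an assumption-laden example and not a description of
any system discussed here. The toy input language (labels ending in a
colon and a JUMP instruction) and the function names are hypothetical.
A forward reference such as "JUMP end" cannot be translated on a single
left-to-right inspection without looking arbitrarily far ahead, so the
first pass scans the whole input for labels and the second pass uses
what the first has found.

```python
from typing import Dict, List


def collect_labels(lines: List[str]) -> Dict[str, int]:
    """Pass 1: scan the whole input and record the address of every label."""
    labels: Dict[str, int] = {}
    address = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = address  # a label names the next instruction
        else:
            address += 1
    return labels


def resolve_jumps(lines: List[str], labels: Dict[str, int]) -> List[str]:
    """Pass 2: rescan the input, replacing each label operand by its address."""
    output: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        if parts[0] == "JUMP":
            output.append(f"JUMP {labels[parts[1]]}")
        else:
            output.append(line)
    return output


def translate(lines: List[str]) -> List[str]:
    """Apply the two passes in order over the same input string."""
    return resolve_jumps(lines, collect_labels(lines))


SOURCE = ["JUMP end", "start:", "LOAD x", "JUMP start", "end:", "STOP"]
print(translate(SOURCE))
```

For the sample input the labels are found at addresses 1 (start) and
3 (end), and the output is JUMP 3, LOAD x, JUMP 1, STOP. A metalanguage
meeting feature (A) would let the writer state any number of such passes
rather than fixing the number in advance.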

At the present time it seems feasible to me that a system
incorporating these features can be designed. With these features it
may be possible for the user to participate fully in the design of his
language L and in the writing of its compiler.

In short, a compiler-compiler is a useful system and rather
important to the continuing development of programming languages. A
universal language would be fine if we really knew how to design one. It
seems evident that mutual consent at conference tables is inadequate.
Either a compiler-compiler or a universal language would be more easily
developed if we had the other, or if we had a firm theoretical basis for
the design of either. In such circumstances science has historically evolved,
and this seems to be the appropriate method here. Finally, building a
compiler-compiler appears to be a very difficult problem, but one capable
of a reasonable solution.